WHAT IS CLAIMED IS:

1. A processor comprising:
an execution unit to execute instructions;
a replay system coupled to the execution unit to replay instructions which have not executed properly, the replay system comprising:
a checker to determine whether each instruction has executed properly; and
a plurality of replay queues, each replay queue coupled to the checker to temporarily store one or more instructions for replay.

2. The processor of claim 1 wherein said plurality of replay queues comprises:
a first replay queue coupled to the checker to temporarily store instructions corresponding to a first thread; and
a second replay queue coupled to the checker to temporarily store instructions corresponding to a second thread.

3. A processor comprising:
an execution unit to execute instructions;
a replay system coupled to the execution unit to replay instructions which have not executed properly, the replay system comprising:
a checker to determine whether each instruction has executed properly; and
a replay queue coupled to the checker to temporarily store one or more instructions of a plurality of threads for replay, the replay queue partitioned into a plurality of sections, each section provided for storing instructions of a corresponding thread.

4. The processor of claim 3 wherein said plurality of replay queue sections comprises:
a first replay queue section coupled to the checker to temporarily store instructions corresponding to a first thread; and
a second replay queue section coupled to the checker to temporarily store instructions corresponding to a second thread.

5. The processor of claim 3 wherein said replay system further comprises:
a replay loop to route an instruction which executed improperly to an execution unit for replay; and
a replay queue loading controller to determine whether to load an improperly executed instruction to the replay loop or into one of the replay queue sections.

6. The processor of claim 3 and further comprising:
a scheduler to output instructions; and
a multiplexer or selection mechanism having a first input coupled to the scheduler, a second input coupled to the replay loop and a plurality of additional inputs, each additional input coupled to an output of one of the replay queue sections.

7. The processor of claim 3 wherein each said replay queue section comprises a replay queue
section coupled to the checker to temporarily store one or more long latency instructions of a thread 
until the long latency instruction is ready for execution. 

8. The processor of claim 3 wherein each replay queue section comprises a thread-specific 
replay queue section coupled to the checker to temporarily store an instruction in which source data 
must be retrieved from an external memory device, the instruction being unloaded from the replay 
queue section when the source data for the instruction returns from the external memory device. 

9. The processor of claim 3 wherein said execution unit is a memory load unit, the processor further comprising:
a first level cache system coupled to the memory load unit;
a second level cache system coupled to the first level cache system; and
wherein the memory load unit performs a data request to external memory if there is a miss on both the first level and second level cache systems.

10. The processor of claim 9 wherein a load instruction of a thread will be loaded into a
replay queue section corresponding to the thread when there is a miss on both the first level and
second level cache systems for the load instruction, and the load instruction is unloaded from the
replay queue section corresponding to the thread for re-execution when the data for the instruction
returns from the external memory.
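
As an illustration only, the behavior recited in claims 9 and 10 can be sketched in software. This is a minimal model under stated assumptions, not the claimed hardware: the class `ReplayQueue`, the function `execute_load`, the dictionary layout of an instruction, and the use of Python sets to stand in for the two cache levels are all hypothetical names chosen for the example.

```python
from collections import deque


class ReplayQueue:
    """Replay queue partitioned into one section per thread (hypothetical model)."""

    def __init__(self, num_threads):
        # One FIFO section per thread, as in the partitioned queue of claim 3.
        self.sections = [deque() for _ in range(num_threads)]

    def load(self, thread_id, instr):
        # Park an instruction in the section that belongs to its thread.
        self.sections[thread_id].append(instr)

    def unload(self, thread_id, addr):
        # Data for `addr` has returned from external memory: release the
        # instructions of this thread that were waiting on it, in order.
        section = self.sections[thread_id]
        ready = [i for i in section if i["addr"] == addr]
        self.sections[thread_id] = deque(i for i in section if i["addr"] != addr)
        return ready


def execute_load(instr, thread_id, l1, l2, replay_queue):
    """Return True on a cache hit; on a miss in both levels, park the load."""
    addr = instr["addr"]
    if addr in l1 or addr in l2:
        return True
    # Miss on both the first level and second level caches (assumed here to
    # trigger the external memory request): store the load for later replay.
    replay_queue.load(thread_id, instr)
    return False
```

In this sketch a load that misses both caches waits only in its own thread's section, so instructions of other threads are not held behind it. Matching on the address in `unload` is an assumption of the example; the claims say only that the instruction is unloaded when its data returns.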

11. A processor comprising:
a multiplexer having an output;
a scheduler coupled to a first input of the multiplexer;
an execution unit coupled to an output of the multiplexer;
a checker coupled to the output of the multiplexer to determine whether an instruction has executed properly;
a plurality of thread-specific replay queue sections to temporarily store instructions for each of a plurality of threads, an output of each of the replay queue sections coupled to additional inputs of the multiplexer; and
a controller coupled to the checker to determine when to load an instruction into one of the replay queue sections and to determine when to unload the replay queue sections.

12. The processor of claim 11 and further comprising a staging section coupled between the
checker and a further input to the multiplexer to provide a replay loop, the controller controlling the
multiplexer to select either the output of the scheduler, the replay loop or an output of one of the
replay queue sections.

13. The processor of claim 11 wherein the controller determines when to unload one or more
of the replay queue sections based on a data return signal.

14. A method of processing instructions comprising:
dispatching an instruction to an execution unit and to a replay system;
determining whether the instruction executed properly;
if the instruction did not execute properly, then:
determining whether the instruction should be routed back for re-execution or whether the instruction should be temporarily stored based on a thread of the instruction.

15. A method of processing instructions comprising:
dispatching an instruction where the instruction is received by an execution unit and a replay system;
determining whether the instruction executed properly;
if the instruction did not execute properly, then:
routing the instruction to the execution unit for re-execution if the instruction is a first type of instruction;
otherwise, loading the instruction into one of a plurality of thread-specific replay queue sections based on a thread of the instruction if the instruction is a second type of instruction.

16. The method of claim 15 wherein the first type of instruction comprises a short latency
instruction, and the second type of instruction is a longer latency instruction. 
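
As an illustration only, the routing decision of claims 15 and 16 can be sketched as a small function. It is a minimal model under assumptions: the names `handle_checked_instruction`, `SHORT_LATENCY` and `LONG_LATENCY`, the `"retire"` outcome for a properly executed instruction, and the dictionary form of an instruction are hypothetical and do not appear in the claims.

```python
from collections import deque

SHORT_LATENCY = "short"  # first type: sent straight back for re-execution
LONG_LATENCY = "long"    # second type: parked in a thread-specific section


def handle_checked_instruction(instr, executed_properly, replay_loop, queue_sections):
    """Route one checked instruction and report where it went."""
    if executed_properly:
        # Assumption of this sketch: nothing further is needed for replay.
        return "retire"
    if instr["latency"] == SHORT_LATENCY:
        # First type of instruction: route back to the execution unit.
        replay_loop.append(instr)
        return "replay_loop"
    # Second type of instruction: store by thread until it is ready.
    queue_sections[instr["thread"]].append(instr)
    return "replay_queue"
```

For example, with `replay_loop = deque()` and `queue_sections = {0: deque(), 1: deque()}`, an improperly executed long latency instruction of thread 1 ends up only in `queue_sections[1]`, leaving the replay loop free for short latency instructions of either thread.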

17. A method of processing instructions comprising:
initially allocating execution resources for multiple threads;

determining that a first thread has stalled;
temporarily storing one or more instructions of the first thread in a queue; and
continuing to allocate execution resources to other threads which have not stalled.

18. The method of claim 17 wherein said step of continuing to allocate comprises the step 
of continuing to allocate execution resources to other threads which have not stalled and inhibiting 
the allocation of further resources to the stalled thread by temporarily storing the stalled thread 
instructions in the queue. 

19. The method of claim 17 wherein priority for execution resources is allocated to the other
threads which have not stalled on a rotating priority basis.

20. The method of claim 17 and further comprising the steps of: 
detecting that the first thread is no longer stalled; 

unloading the one or more instructions of the first thread from the queue; and 
re-allocating at least some execution resources to the first thread.

21. The method of claim 17 wherein the step of determining that a first thread has stalled
comprises detecting a long latency or agent instruction for the first thread. 
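
As an illustration only, the method of claims 17 through 20 can be sketched as a small allocator. It is a minimal model under assumptions: the class `ThreadAllocator` and its methods `stall`, `resume` and `allocate` are hypothetical names, and a simple round-robin pointer is used as one possible reading of "rotating priority basis".

```python
from collections import deque


class ThreadAllocator:
    """Hands out execution slots to threads that have not stalled (hypothetical)."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.stalled = set()
        self.queue = {t: deque() for t in range(num_threads)}
        self.next_thread = 0  # rotating priority pointer (assumed round-robin)

    def stall(self, thread_id, pending_instrs):
        # The thread has stalled: hold its instructions in the queue so they
        # stop consuming execution resources.
        self.stalled.add(thread_id)
        self.queue[thread_id].extend(pending_instrs)

    def resume(self, thread_id):
        # The thread is no longer stalled: unload its instructions and make
        # it eligible for execution resources again.
        self.stalled.discard(thread_id)
        released = list(self.queue[thread_id])
        self.queue[thread_id].clear()
        return released

    def allocate(self):
        """Return the next non-stalled thread in rotation, or None if all stalled."""
        for offset in range(self.num_threads):
            t = (self.next_thread + offset) % self.num_threads
            if t not in self.stalled:
                self.next_thread = (t + 1) % self.num_threads
                return t
        return None
```

With three threads, stalling thread 1 makes `allocate()` alternate between threads 0 and 2; after `resume(1)` the rotation includes thread 1 again.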